Intro
We talk often about player tendencies to achieve certain outcomes… a
hitter’s spray chart, a hitter’s launch angle breakdown, a pitcher’s
pitch distribution, a pitcher’s fly ball to ground ball ratio, etc.
However, there is a common event that often gets overlooked in baseball
analysis: the foul ball. Every now and then, a stat about foul balls
will emerge, such as how https://www.si.com/more-sports/2012/06/05/joey-vottoreds,
but more attention is lent to the balls that are put in play, watched or
whiffed at.
ggplot(
dbGetQuery(bq, 'SELECT
description,
count / total `%`
FROM
(
SELECT
1 foo,
description,
COUNT(*) count
FROM
(
SELECT
REGEXP_REPLACE(description, "_*blocked_*", "") description,
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R"
)
GROUP BY
description
)
JOIN
(
SELECT
1 foo,
COUNT(*) total
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R"
)
USING
(foo);'),
aes(reorder(description, -`%`), `%`)
) +
geom_bar(stat = "identity", fill = "#0099f9") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "2023 Pitch Results", x = "", y = "Frequency") +
scale_y_continuous(labels = percent, limits = c(0, 0.4)) +
geom_text(aes(label = percent(`%`, accuracy = 0.1)), vjust = -0.5, size = 3)

Foul balls make up nearly 18% of pitch outcomes and can be
interpreted in a number of different ways. With less than two strikes… -
the pitch wasn’t really what the batter was looking for, but he swung
anyway. - the batter’s swing or timing was slightly off. With two
strikes… - the batter is just trying to stay alive with two strikes. -
the pitch wasn’t located well enough or wasn’t deceptive enough to miss
the bat entirely.
Fouls can feel like a neutral outcome when they happen on the field,
but they are almost always a positive outcome for either the pitcher or
the hitter. For example, with less than two strikes, a foul ball is a
positive outcome for the pitcher. Certain foul balls, like a towering
drive that hooks foul, can indicate that the batter has the upper hand,
but it still counts like any other strike, and the pitcher can breath a
sigh of relief. When there are two strikes, a foul ball is always going
to feel like the batter held strong, and the pitcher is annoyed that he
has to throw another pitch. In short, a foul ball is a win for the
pitcher with less than 2 strikes, and it is a win for the hitter with 2
strikes. With this in mind, I wanted to split hitter/pitcher foul ball
tendencies based on the count.
batters.2023.df <- statcast_leaderboard$batter %>%
filter(pa >= 200) %>%
mutate(Name = sub("(.+),\\s(.+)","\\2 \\1", `last_name, first_name`)) %>%
select(-year, -`last_name, first_name`, -est_ba_minus_ba_diff,
-est_slg_minus_slg_diff, -est_woba_minus_woba_diff) %>%
rename(PA = pa) %>%
# All pitches
inner_join(
dbGetQuery(bq, 'SELECT
batter player_id,
COUNTIF(description IN ("swinging_strike", "foul_tip",
"swinging_strike_blocked")) / COUNT(*) `0|1-Strike Whiff %`,
COUNTIF(description = "foul") / COUNT(*) `0|1-Strike Foul %`
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R" AND strikes < 2 AND
/* Swing */
description IN ("foul", "hit_into_play", "swinging_strike",
"foul_tip", "swinging_strike_blocked")
GROUP BY
batter;'),
by = "player_id"
) %>%
# 2-Strikes
inner_join(
dbGetQuery(bq, 'SELECT
batter player_id,
COUNTIF(description IN ("swinging_strike", "foul_tip",
"swinging_strike_blocked")) / COUNT(*) `2-Strike Whiff %`,
COUNTIF(description = "foul") / COUNT(*) `2-Strike Foul %`
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R" AND strikes = 2 AND
/* Swing */
description IN ("foul", "hit_into_play", "swinging_strike",
"foul_tip", "swinging_strike_blocked")
GROUP BY
batter;'),
by = "player_id"
)
pitchers.2023.df <- statcast_leaderboard$pitcher %>%
filter(pa >= 200) %>%
mutate(Name = sub("(.+),\\s(.+)","\\2 \\1", `last_name, first_name`)) %>%
select(-year, -`last_name, first_name`, -est_ba_minus_ba_diff,
-est_slg_minus_slg_diff, -est_woba_minus_woba_diff) %>%
rename(TBF = pa) %>%
# All pitches
inner_join(
dbGetQuery(bq, 'SELECT
pitcher player_id,
COUNTIF(description IN ("swinging_strike", "foul_tip",
"swinging_strike_blocked")) / COUNT(*) `0|1-Strike Whiff %`,
COUNTIF(description = "foul") / COUNT(*) `0|1-Strike Foul %`
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R" AND strikes < 2 AND
/* Swing */
description IN ("foul", "hit_into_play", "swinging_strike",
"foul_tip", "swinging_strike_blocked")
GROUP BY
pitcher;'),
by = "player_id"
) %>%
# 2-Strikes
inner_join(
dbGetQuery(bq, 'SELECT
pitcher player_id,
COUNTIF(description IN ("swinging_strike", "foul_tip",
"swinging_strike_blocked")) / COUNT(*) `2-Strike Whiff %`,
COUNTIF(description = "foul") / COUNT(*) `2-Strike Foul %`
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R" AND strikes = 2 AND
/* Swing */
description IN ("foul", "hit_into_play", "swinging_strike",
"foul_tip", "swinging_strike_blocked")
GROUP BY
pitcher;'),
by = "player_id"
)
2023 Batters (min. 200 PAs)
10 Highest
kable(
batters.2023.df %>%
select(Name, PA, `0|1-Strike Foul %`) %>%
arrange(desc(`0|1-Strike Foul %`)) %>%
mutate(
`0|1-Strike Foul %` = percent(`0|1-Strike Foul %`, accuracy = 0.01)
) %>%
head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
|
Name
|
PA
|
0|1-Strike Foul %
|
|
Jake Cronenworth
|
522
|
48.78%
|
|
Isaac Paredes
|
571
|
48.43%
|
|
Bo Naylor
|
230
|
46.54%
|
|
Ozzie Albies
|
660
|
45.89%
|
|
Nathaniel Lowe
|
724
|
45.59%
|
|
Gio Urshela
|
228
|
45.05%
|
|
Zach McKinstry
|
518
|
45.03%
|
|
Max Kepler
|
491
|
44.52%
|
|
Adley Rutschman
|
687
|
44.26%
|
|
Pavin Smith
|
228
|
44.20%
|
kable(
batters.2023.df %>%
select(Name, PA, `2-Strike Foul %`) %>%
arrange(desc(`2-Strike Foul %`)) %>%
mutate(
`2-Strike Foul %` = percent(`2-Strike Foul %`, accuracy = 0.01)
) %>%
head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
|
Name
|
PA
|
2-Strike Foul %
|
|
Yordan Alvarez
|
496
|
46.95%
|
|
Sal Frelick
|
223
|
46.75%
|
|
Geraldo Perdomo
|
495
|
46.02%
|
|
Justin Turner
|
626
|
45.81%
|
|
Ty France
|
665
|
45.59%
|
|
Isiah Kiner-Falefa
|
361
|
45.55%
|
|
Cody Bellinger
|
556
|
45.43%
|
|
Daulton Varsho
|
581
|
45.11%
|
|
Will Smith
|
554
|
44.87%
|
|
Alec Burleson
|
347
|
44.87%
|
10 Lowest
kable(
batters.2023.df %>%
select(Name, PA, `0|1-Strike Foul %`) %>%
arrange(`0|1-Strike Foul %`) %>%
mutate(
`0|1-Strike Foul %` = percent(`0|1-Strike Foul %`, accuracy = 0.01)
) %>%
head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
|
Name
|
PA
|
0|1-Strike Foul %
|
|
Christian Bethancourt
|
332
|
29.48%
|
|
William Contreras
|
611
|
29.57%
|
|
Eloy Jiménez
|
489
|
29.92%
|
|
Javier Báez
|
547
|
30.26%
|
|
Kevin Kiermaier
|
408
|
30.29%
|
|
Aaron Judge
|
458
|
30.75%
|
|
Jose Siri
|
364
|
30.82%
|
|
Joey Wiemer
|
410
|
30.87%
|
|
Jordan Walker
|
465
|
30.91%
|
|
Luke Raley
|
406
|
31.07%
|
kable(
batters.2023.df %>%
select(Name, PA, `2-Strike Foul %`) %>%
arrange(`2-Strike Foul %`) %>%
mutate(
`2-Strike Foul %` = percent(`2-Strike Foul %`, accuracy = 0.01)
) %>%
head(10)
) %>% kable_styling(bootstrap_options = c("bordered", "striped"))
|
Name
|
PA
|
2-Strike Foul %
|
|
Jazz Chisholm Jr.
|
383
|
23.53%
|
|
Mark Vientos
|
233
|
25.32%
|
|
Jose Siri
|
364
|
27.57%
|
|
Eloy Jiménez
|
489
|
27.94%
|
|
Harrison Bader
|
344
|
28.64%
|
|
Brett Baty
|
389
|
28.69%
|
|
Francisco Alvarez
|
423
|
28.87%
|
|
Christopher Morel
|
429
|
29.08%
|
|
Mickey Moniak
|
323
|
29.51%
|
|
Paul DeJong
|
400
|
29.64%
|
# https://plotly.com/ggplot2/configuration-options/
# Highlight POIs: https://thiyanga.netlify.app/post/scatterplot/
batter.scatter.plot.df <- batters.2023.df %>%
mutate(diff = `2-Strike Foul %` - `0|1-Strike Foul %`,
color = as.factor(
ifelse(diff >= 0.06, "#025189",
ifelse(diff >= 0.03, "#0c9cb4",
ifelse(diff >= 0, "#94c280",
ifelse(diff >= -0.03, "#f1c359",
ifelse(diff >= -0.06, "#d03f2e",
"#982123"))))))) %>%
select(Name, `2-Strike Foul %`, `0|1-Strike Foul %`, color)
ggplotly(
ggplot(
batter.scatter.plot.df,
aes(x = `0|1-Strike Foul %`, y = `2-Strike Foul %`,
text = paste(Name, "\n0/1-Strike Foul %: ",
percent(`0|1-Strike Foul %`, accuracy = 0.1),
"\n2-Strike Foul %: ",
percent(`2-Strike Foul %`, accuracy = 0.1), sep = ""))) +
geom_point(aes(color = color)) +
scale_colour_manual(values = levels(batter.scatter.plot.df$color)) +
labs(title = "2023 Batters (min. 200 PAs)") +
scale_x_continuous(labels = percent) +
scale_y_continuous(labels = percent) +
geom_abline(linetype = "dotted") +
theme(legend.position = "none"),
tooltip = "text") %>%
layout(annotations = list(
list(text = "Hover over any point for player details", x = 0.437, y = 0.462,
font = list(size = 10)),
list(text = "y = x", x = 0.484, y = 0.476, font = list(size = 10),
showarrow = FALSE))) %>%
config(displayModeBar = FALSE)
There is a positive correlation between 0/1-Strike Foul % and
2-Strike Foul % (correlation coefficient of 0.42), but substantial
variance exists, too. Take Yordan Alvarez, for example. With 0 or 1
strike, he fouls off only 34.1% of pitches, but that number increases to
46.9% with 2 strikes, which is the highest in all of MLB.
ggplotly(
ggplot(
dbGetQuery(bq, 'WITH
team_foul_pct
AS
(
SELECT
CASE inning_topbot WHEN "top" THEN away_team ELSE home_team END team,
strikes,
COUNTIF(description = "foul") fouls,
COUNT(*) pitches,
FROM
`mlb.statcast_pitches`
WHERE
game_year = 2023 AND game_type = "R" AND
description IN ("foul", "hit_into_play", "swinging_strike",
"foul_tip", "swinging_strike_blocked") /* Swing */
GROUP BY
team,
strikes
)
SELECT
team,
`0|1-Strike Foul %`,
`2-Strike Foul %`
FROM
(
SELECT
team,
SUM(fouls) / SUM(pitches) `0|1-Strike Foul %`
FROM
team_foul_pct
WHERE
strikes < 2
GROUP BY
team
)
JOIN
(
SELECT
team,
SUM(fouls) / SUM(pitches) `2-Strike Foul %`
FROM
team_foul_pct
WHERE
strikes = 2
GROUP BY
team
)
USING
(team);'),
aes(x = `0|1-Strike Foul %`, y = `2-Strike Foul %`,
text = paste(team, "\n0/1-Strike Foul %: ",
percent(`0|1-Strike Foul %`, accuracy = 0.1),
"\n2-Strike Foul %: ",
percent(`2-Strike Foul %`, accuracy = 0.1), sep = ""))) +
geom_point(col = "#69b3a2") +
labs(title = "2023 Teams (Batting)") +
scale_x_continuous(labels = percent) +
scale_y_continuous(labels = percent) +
geom_abline(linetype = "dotted") +
theme(legend.position = "none"),
tooltip = "text") %>%
layout(annotations = list(
list(text = "Hover over any point for team details", x = 0.377, y = 0.3965,
font = list(size = 10)),
list(text = "y = x", x = 0.388, y = 0.39, font = list(size = 10),
showarrow = FALSE))) %>%
config(displayModeBar = FALSE)